
    A Joint Model for Unsupervised Chinese Word Segmentation

    In this paper, we propose a joint model for unsupervised Chinese word segmentation (CWS). Inspired by the 'products of experts' idea, our joint model first combines two generative models, a word-based hierarchical Dirichlet process (HDP) model and a character-based hidden Markov model (HMM), by simply multiplying their probabilities together. Gibbs sampling is used for model inference. To further incorporate the strength of a goodness-based model, we then integrate nVBE into the joint model by using it to initialize the Gibbs sampler. We conduct our experiments on the PKU and MSRA datasets provided by the second SIGHAN bakeoff. Test results on these two datasets show that the joint model achieves much better results than all of its component models. Statistical significance tests also show that it is significantly better than state-of-the-art systems, achieving the highest F-scores. Finally, analysis indicates that, compared with nVBE and HDP, the joint model has a stronger ability to resolve both combinational and overlapping ambiguities in Chinese word segmentation. © 2014 Association for Computational Linguistics.
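    The 'products of experts' combination described above amounts to scoring a candidate segmentation by the product of the two component models' probabilities, i.e. the sum of their log-probabilities. Below is a minimal sketch of that scoring step, assuming hypothetical `hdp_log_prob` and `hmm_log_prob` functions; it illustrates the idea rather than the paper's implementation.

```python
def joint_log_prob(segmentation, hdp_log_prob, hmm_log_prob):
    """Product-of-experts score: multiply the word-based HDP probability by
    the character-based HMM probability, i.e. add their log-probabilities."""
    return hdp_log_prob(segmentation) + hmm_log_prob(segmentation)

def best_segmentation(candidates, hdp_log_prob, hmm_log_prob):
    """Pick the candidate segmentation with the highest joint score."""
    return max(candidates,
               key=lambda seg: joint_log_prob(seg, hdp_log_prob, hmm_log_prob))
```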

    Neural Chinese Word Segmentation with Lexicon and Unlabeled Data via Posterior Regularization

    Existing methods for CWS usually rely on a large number of labeled sentences to train word segmentation models, and such sentences are expensive and time-consuming to annotate. Fortunately, unlabeled data is usually easy to collect, and many high-quality Chinese lexicons are available off the shelf; both can provide useful information for CWS. In this paper, we propose a neural approach for Chinese word segmentation which can exploit both a lexicon and unlabeled data. Our approach is based on a variant of the posterior regularization algorithm, and the unlabeled data and lexicon are incorporated into model training as indirect supervision by regularizing the prediction space of CWS models. Extensive experiments on multiple benchmark datasets in both in-domain and cross-domain scenarios validate the effectiveness of our approach. Comment: 7 pages, 11 figures, accepted by the 2019 World Wide Web Conference (WWW '19).
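    Posterior regularization of this kind typically adds a penalty that pulls the model's predicted tag distributions on unlabeled sentences toward a constraint distribution, here derived from lexicon matches. A minimal sketch of such a regularized objective, assuming per-character tag posteriors and a hypothetical constraint distribution (the names and the KL form are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def kl(q, p, eps=1e-12):
    """KL(q || p) between two tag distributions for a single character."""
    q, p = np.asarray(q, float) + eps, np.asarray(p, float) + eps
    return float(np.sum(q * np.log(q / p)))

def pr_penalty(model_posteriors, constraint_posteriors):
    """Sum of per-character KL terms pulling the model's tag posteriors
    toward the lexicon-derived constraint distribution (indirect supervision)."""
    return sum(kl(q, p) for q, p in zip(constraint_posteriors, model_posteriors))

def regularized_loss(supervised_loss, model_posteriors, constraint_posteriors, lam=0.1):
    """Labeled-data loss plus the weighted posterior-regularization penalty."""
    return supervised_loss + lam * pr_penalty(model_posteriors, constraint_posteriors)
```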

    An Effective Neural Network Model for Graph-based Dependency Parsing

    Most existing graph-based parsing models rely on millions of hand-crafted features, which limits their generalization ability and slows down parsing. In this paper, we propose a general and effective neural network model for graph-based dependency parsing. Our model can automatically learn high-order feature combinations using only atomic features by exploiting a novel activation function, tanh-cube. Moreover, we propose a simple yet effective way to utilize phrase-level information that is expensive to use in conventional graph-based parsers. Experiments on the English Penn Treebank show that parsers based on our model perform better than conventional graph-based parsers. © 2015 Association for Computational Linguistics.
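    The tanh-cube activation combines a cubic term, which lets a unit form products of its inputs (and hence high-order feature combinations), with a bounded tanh output. A minimal sketch, assuming the commonly cited form tanh(x^3 + x); the exact form used in the paper should be checked against the original:

```python
import numpy as np

def tanh_cube(x):
    """Assumed tanh-cube activation tanh(x**3 + x): the cubic term mixes
    input features multiplicatively, while tanh keeps the output bounded."""
    return np.tanh(np.power(x, 3) + x)

# Toy usage on a vector of hidden pre-activations
print(tanh_cube(np.array([-1.5, 0.0, 0.3, 2.0])))
```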

    Max-Margin Tensor Neural Network for Chinese Word Segmentation

    Recently, neural network models for natural language processing tasks have attracted increasing attention for their ability to alleviate the burden of manual feature engineering. In this paper, we propose a novel neural network model for Chinese word segmentation called the Max-Margin Tensor Neural Network (MMTNN). By exploiting tag embeddings and a tensor-based transformation, MMTNN can model complicated interactions between tags and context characters. Furthermore, a new tensor factorization approach is proposed to speed up the model and avoid overfitting. Experiments on the benchmark dataset show that our model achieves better performance than previous neural network models and competitive performance with minimal feature engineering. Although Chinese word segmentation is a specific case, MMTNN can easily be generalized and applied to other sequence labeling tasks. © 2014 Association for Computational Linguistics.
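    A tensor layer scores an input with one bilinear form per output unit, which is expensive and prone to overfitting; factorizing each tensor slice into two thin matrices reduces both cost and parameter count. A minimal numpy sketch of that idea (the dimensions d, k and rank r are arbitrary assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 50, 20, 4                  # input size, output units, assumed low rank

T = rng.normal(size=(k, d, d))       # full tensor: one d x d slice per output
P = rng.normal(size=(k, d, r))       # low-rank factors: T[i] ~= P[i] @ Q[i]
Q = rng.normal(size=(k, r, d))

def tensor_layer_full(x):
    """Bilinear score x^T T[i] x for every output unit i (cost ~ k*d*d)."""
    return np.einsum('i,kij,j->k', x, T, x)

def tensor_layer_factored(x):
    """Same form with factored slices, computed as (x P[i]) . (Q[i] x), cost ~ k*d*r."""
    left = np.einsum('i,kir->kr', x, P)
    right = np.einsum('krj,j->kr', Q, x)
    return np.sum(left * right, axis=1)

x = rng.normal(size=d)
scores = tensor_layer_factored(x)
```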

    Characteristics Analysis of an Electromagnetic Actuator for Magnetic Levitation Transportation

    In this article, an electromagnetic actuator is proposed to improve the driving performance of magnetic levitation transportation applied to ultra-clean manufacturing. The electromagnetic actuator mainly consists of a stator with a Halbach array and a mover with a symmetrical structure. First, the actuator's principle and structure are illustrated. Then, to select a suitable secondary structure and analyze the characteristics of the actuator, the electromagnetic characteristics of actuators with different secondary structures are analyzed by the finite element method (FEM). The analysis shows that adopting a secondary structure with a Halbach array increases the electromagnetic force and improves working stability, so the secondary with the three-section Halbach array is selected for the electromagnetic actuator. Next, the influence of the secondary permanent magnet (PM) thickness on the electromagnetic force is analyzed by FEM. The results indicate that increasing the PM thickness raises the electromagnetic force but lowers the utilization ratio of the PM. Finally, a prototype of the electromagnetic actuator is built and experiments are carried out; the experimental results verify the correctness of the theoretical analysis and the effectiveness of the electromagnetic actuator.

    Carbon isotope and origin of the hydrocarbon gases in the Junggar Basin, China

    The genetic type, source, and distribution of hydrocarbon gases in the Junggar Basin were clarified through carbon isotope analysis. Mature to post-mature oil-type gas, mature to post-mature coal-type gas, transition gas, and biogas are identified in the Junggar Basin. The partly reversed carbon isotope order of the hydrocarbon gases is attributed to one or several of the following causes: mixing of oil-type and coal-type gases, mixing of coal-type gases from different sources, mixing of coal-type gases of varied maturity, and microbial action. Three types of coal-type gases are identified in the Junggar Basin. The first type, characterized by high δ13C values of the heavy hydrocarbon gases (δ13C2 > −26.0‰), comprises mature to highly mature gases generated from Jurassic source rocks. The second type, characterized by low δ13C values of the heavy hydrocarbon gases (δ13C2 < −26.0‰) and a wide maturity range, is generated from one or several source rocks in the Jurassic and in the Wuerhe and Jiamuhe Formations of the Permian. The third type, characterized by a wide range of δ13C values of the heavy hydrocarbon gases and high to post-maturity, is generated from Carboniferous source rocks. Keywords: Carbon isotope, Hydrocarbon gas, Gas-source correlation, Junggar Basin
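    The abstract's grouping of coal-type gases can be read as a simple screening rule on the δ13C2 value and maturity. A toy sketch of that rule, with the maturity labels as illustrative assumptions (the actual classification in the paper relies on fuller gas-source correlation, not a single threshold):

```python
def coal_gas_type(delta13c2_permil, maturity):
    """Rough screening using the -26.0 permil delta13C2 cut-off quoted above;
    'maturity' is an assumed qualitative label ('mature', 'high', 'post')."""
    if delta13c2_permil > -26.0:
        return "Type 1: mature to highly mature, Jurassic source rocks"
    if maturity in ("high", "post"):
        return "Type 3: high- to post-mature, Carboniferous source rocks"
    return "Type 2: wide maturity range, Jurassic/Permian source rocks"

print(coal_gas_type(-24.5, "mature"))  # Type 1
print(coal_gas_type(-28.0, "post"))    # Type 3
```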

    List of antagonistic combinations found in our study.


    Distribution of percentage of synergistic cases under various parameter sets for all combinations studied.

    Consistently synergistic and antagonistic combinations are marked, showing their stark contrast in number.